Skip to content

OCPBUGS-66219: fix: kubernetes/conformance suite execution from list and failure propagation#82

Merged
openshift-merge-bot[bot] merged 3 commits intoredhat-openshift-ecosystem:mainfrom
mtulio:spike-420-issues
Dec 5, 2025
Merged

OCPBUGS-66219: fix: kubernetes/conformance suite execution from list and failure propagation#82
openshift-merge-bot[bot] merged 3 commits intoredhat-openshift-ecosystem:mainfrom
mtulio:spike-420-issues

Conversation

@mtulio
Copy link
Contributor

@mtulio mtulio commented Nov 26, 2025

Summary

This PR implements two critical fixes for OCP 4.20+ compatibility and improved CI efficiency:

  1. Failure propagation mechanism to stop dependent plugins when prerequisites fail
  2. Support for consuming extracted Kubernetes conformance tests from init container

Changes

1. Failure Propagation (plugin.go)

Problem: When a conformance plugin fails, subsequent dependent plugins continue running and waste execution time on tests that cannot succeed.

Solution: Added failure detection in RunDependencyWaiter() that:

  • Checks if blocker plugin status is "failed"
  • Returns error to stop dependent plugin execution
  • Exception: Plugin 99 (artifacts-collector) always runs to collect logs

Code change (openshift-tests-plugin/pkg/plugin/plugin.go:714-719):

// Check if the blocker plugin failed and propagate failure to dependent plugins
// Exception: artifacts collector (99-openshift-artifacts-collector) should always run
if pStatusBlocker.Status == "failed" && p.ID() != PluginId99 {
    log.Errorf("Blocker plugin[%s] failed. Propagating failure to dependent plugin[%s]", pluginBlocker, p.Name())
    return fmt.Errorf("blocker plugin %s failed, stopping execution of dependent plugin %s", pluginBlocker, p.Name())
}

Failure propagation chain:

  • 05 (upgrade) → blocks → 10 (kube-conformance)
  • 10 (kube-conformance) → blocks → 20 (conformance-validated)
  • 20 (conformance-validated) → blocks → 80 (replay)
  • 80 (replay) → blocks → 99 (artifacts-collector)

2. Kubernetes Conformance Test Extraction (entrypoint-tests.sh)

Problem: OCP 4.20+ removed the kubernetes/conformance suite from openshift-tests.

Solution: Added logic to consume extracted test list from init container:

  • Checks for /tmp/shared/k8s-conformance-tests.list (created by init container in opct repo)
  • Uses extracted tests if available and non-empty
  • Falls back to default suite if extraction fails or file is missing

Code change (openshift-tests-plugin/plugin/entrypoint-tests.sh:59-76):

# Check if we have extracted k8s conformance tests from OTE
K8S_CONFORMANCE_LIST="/tmp/shared/k8s-conformance-tests.list"
if [[ "${PLUGIN_NAME:-}" == "openshift-kube-conformance" ]] && [[ -f "${K8S_CONFORMANCE_LIST}" ]]; then
    TEST_COUNT=$(wc -l < "${K8S_CONFORMANCE_LIST}")
    if [[ $TEST_COUNT -gt 0 ]]; then
        echo "Using extracted Kubernetes conformance tests from OTE (${TEST_COUNT} tests)"
        cp "${K8S_CONFORMANCE_LIST}" "${CTRL_SUITE_LIST}"
        echo "Tests extracted from k8s-tests-ext binary" > ${CTRL_SUITE_LIST}.log
    else
        # Fallback to default suite
    fi
fi

Benefits

  1. Improved CI efficiency: Stops wasted execution time on dependent tests
  2. Clearer failure signals: Failed prerequisites immediately propagate
  3. OCP 4.20+ support: Works with extracted conformance tests
  4. Backward compatible: Falls back gracefully on older versions
  5. Cleaner code: Uses plugin ID instead of name/alias checks

Related PRs

Testing

Validated with CI rehearsal jobs on OCP 4.20+ clusters.

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2025
@mtulio mtulio force-pushed the spike-420-issues branch 3 times, most recently from 33c616d to 37f5c1b Compare December 1, 2025 18:40
mtulio added a commit to mtulio/provider-certification-tool that referenced this pull request Dec 1, 2025
Added init container to dynamically extract Kubernetes conformance tests
from the OTE k8s-tests-ext binary to support OCP 4.20+ where the
kubernetes/conformance suite was removed from openshift-tests.

Changes:
- Added extract-k8s-conformance-tests init container
- Extracts k8s-tests-ext.gz from hyperkube image
- Filters conformance tests using jq or grep as fallback
- Saves test list to /tmp/shared/k8s-conformance-tests.list
- Tests container uses extracted list via entrypoint-tests.sh
- Gracefully falls back to default suite if extraction fails

This ensures the kubernetes/conformance suite continues to work on
OCP 4.20+ clusters while maintaining backward compatibility with
earlier versions.

Related: redhat-openshift-ecosystem/provider-certification-plugins#82

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
This commit implements two critical fixes for OCP 4.20+ compatibility:

1. **Failure propagation mechanism**
   - Detects when blocker plugins fail and stops dependent plugins
   - Prevents wasted execution time on tests that can't succeed
   - Exception: artifacts collector (plugin 99) always runs
   - Uses plugin ID check for cleaner, more maintainable logic
   - Location: openshift-tests-plugin/pkg/plugin/plugin.go:716

2. **Kubernetes conformance test extraction support**
   - Consumes extracted test list from init container
   - Checks for /tmp/shared/k8s-conformance-tests.list
   - Falls back to default suite if extraction unavailable
   - Works with init container defined in opct repository
   - Location: openshift-tests-plugin/plugin/entrypoint-tests.sh:59-76

Failure propagation chain:
- 05 (upgrade) → blocks → 10 (kube-conformance)
- 10 (kube-conformance) → blocks → 20 (conformance-validated)
- 20 (conformance-validated) → blocks → 80 (replay)
- 80 (replay) → blocks → 99 (artifacts-collector)
- Plugin 99 (artifacts collector) always runs regardless of failures

Related: redhat-openshift-ecosystem/opct#183

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@mtulio mtulio changed the title fix: implement failure propagation and reduce log verbosity fix: add failure propagation and k8s test extraction support Dec 1, 2025
@mtulio mtulio changed the title fix: add failure propagation and k8s test extraction support OCPBUGS-66219: fix: implement failure propagation and reduce log verbosity Dec 1, 2025
@mtulio mtulio changed the title OCPBUGS-66219: fix: implement failure propagation and reduce log verbosity OCPBUGS-66219: fix: kubernetes/conformance suite execution from list and failure propagation Dec 1, 2025
@mtulio mtulio marked this pull request as ready for review December 1, 2025 20:18
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2025
mtulio added a commit to mtulio/provider-certification-tool that referenced this pull request Dec 2, 2025
Added init container to dynamically extract Kubernetes conformance tests
from the OTE k8s-tests-ext binary to support OCP 4.20+ where the
kubernetes/conformance suite was removed from openshift-tests.

Changes:
- Added extract-k8s-conformance-tests init container
- Extracts k8s-tests-ext.gz from hyperkube image
- Filters conformance tests using jq or grep as fallback
- Saves test list to /tmp/shared/k8s-conformance-tests.list
- Tests container uses extracted list via entrypoint-tests.sh
- Gracefully falls back to default suite if extraction fails

This ensures the kubernetes/conformance suite continues to work on
OCP 4.20+ clusters while maintaining backward compatibility with
earlier versions.

Related: redhat-openshift-ecosystem/provider-certification-plugins#82

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@mtulio mtulio added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 5, 2025
@jcpowermac
Copy link
Collaborator

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 5, 2025
@openshift-ci
Copy link

openshift-ci bot commented Dec 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcpowermac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 5, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 901d102 into redhat-openshift-ecosystem:main Dec 5, 2025
13 checks passed
mtulio added a commit to mtulio/provider-certification-tool that referenced this pull request Dec 5, 2025
Added init container to dynamically extract Kubernetes conformance tests
from the OTE k8s-tests-ext binary to support OCP 4.20+ where the
kubernetes/conformance suite was removed from openshift-tests.

Changes:
- Added extract-k8s-conformance-tests init container
- Extracts k8s-tests-ext.gz from hyperkube image
- Filters conformance tests using jq or grep as fallback
- Saves test list to /tmp/shared/k8s-conformance-tests.list
- Tests container uses extracted list via entrypoint-tests.sh
- Gracefully falls back to default suite if extraction fails

This ensures the kubernetes/conformance suite continues to work on
OCP 4.20+ clusters while maintaining backward compatibility with
earlier versions.

Related: redhat-openshift-ecosystem/provider-certification-plugins#82

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@mtulio mtulio deleted the spike-420-issues branch December 5, 2025 18:57
mtulio added a commit to redhat-openshift-ecosystem/opct that referenced this pull request Dec 5, 2025
## Summary

This PR addresses OCP 4.20+ compatibility issues where the
`kubernetes/conformance` suite was removed from openshift-tests,
requiring dynamic extraction from the k8s-tests-ext binary.
Additionally, it refactors the client architecture to improve code
organization and maintainability.

## Problem

Starting with OCP 4.20, the `kubernetes/conformance` suite is no longer
available in openshift-tests. This causes the kube-conformance plugin to
fail when trying to run conformance tests. Additionally, the client
initialization code was scattered across multiple packages, making it
harder to maintain and test.

## Solution

### 1. Dynamic Test Extraction (OCP 4.20+ Support)

Added an init container (`extract-k8s-conformance-tests`) to the
kube-conformance plugin that:
1. Retrieves the hyperkube image from the cluster release
2. Extracts the k8s-tests-ext.gz binary
3. Runs the binary to list all tests and filters conformance tests
4. Saves the test list to `/tmp/shared/k8s-conformance-tests.list`
5. The tests container uses this list via the entrypoint script

### 2. Suite Name Enforcement by Cluster Version

Added logic to automatically detect cluster version and set the
appropriate suite name:
- **OCP 4.19 and earlier**: Uses `kubernetes/conformance` (default)
- **OCP 4.20+**: Uses `kubernetes/conformance/parallel` (new parallel
suite)

Implementation:
- Added `setSuiteName()` function in `pkg/run/run.go` that:
  - Detects cluster version using the ClusterVersion API
  - Parses version to determine if >= 4.20
  - Sets `suiteNameKubernetesConformance` ConfigMap key accordingly
- Plugin consumes this via `DEFAULT_SUITE_NAME` environment variable

### 3. Client Architecture Refactoring

Consolidated client creation into a unified `Client` struct:

**Before**:
```go
kclient, sclient, err := client.CreateClients()
restConfig, err := client.CreateRestConfig()
```

**After**:
```go
cli, err := client.NewClient()
// Access via cli.KClient, cli.SClient, cli.RestConfig
```

**Benefits**:
- Single source of truth for client initialization
- Easier to pass clients between functions
- Better testability with comprehensive unit tests
- Reduced code duplication

**Files Updated**:
- `pkg/client/client.go` - New unified Client struct
- `pkg/client/client_test.go` - Comprehensive unit tests (24 test
scenarios)
- `pkg/run/run.go` - Updated to use new client architecture
- `pkg/run/validations.go` - Updated function signatures
- `pkg/status/status.go` - Updated to use new client
- `pkg/destroy/destroy.go` - Updated to use new client
- `pkg/cmd/adm/e2ed/taintNode.go` - Updated to use new client

## Changes

### Core Implementation Files

**`data/templates/plugins/openshift-kube-conformance.yaml`**
- Added `extract-k8s-conformance-tests` init container (73 insertions)
- Extracts tests using `jq` with grep fallback for compatibility
- Gracefully handles failures by falling back to default suite
- Consumes `DEFAULT_SUITE_NAME` from ConfigMap for version-specific
suite selection
- Fixed yamllint issues (line length, string continuation)

**`pkg/run/run.go`**
- Added `setSuiteName()` function to detect cluster version and set
suite name
- Updated to use new unified `Client` struct
- Enhanced dry-run mode to print ConfigMaps as JSON
- Fixed nil pointer dereference by correcting variable shadowing
- Added suite name to plugin configuration via ConfigMap

**`pkg/client/client.go`**
- Refactored to use unified `Client` struct with KClient, SClient, and
RestConfig
- Replaced `CreateClients()` with `NewClient()` for better encapsulation
- Made `createRestConfig()` private for better encapsulation

**`pkg/client/client_test.go`** (NEW)
- Added comprehensive unit tests for client creation
- 24 test scenarios covering success and error paths
- Tests for environment variable priority and configuration fallback

### Supporting Files

**`pkg/run/validations.go`**
- Updated function signatures to accept RestConfig as parameter
- Removed dependency on client package for config creation

**`pkg/status/status.go`**
- Updated to use new Client struct
- Fixed missing assignment bug in client initialization

**`pkg/destroy/destroy.go`**
- Updated to use `NewClient()` instead of `CreateClients()`

**`pkg/cmd/adm/e2ed/taintNode.go`**
- Updated to use new client architecture

**`.containerignore` / `.gitignore`**
- Added patterns for better build artifact management

## Backward Compatibility

- ✅ **OCP 4.19 and earlier**: Falls back to `kubernetes/conformance`
suite
- ✅ **OCP 4.20+**: Uses `kubernetes/conformance/parallel` suite
dynamically
- ✅ **Graceful degradation**: If version detection or test extraction
fails, uses default suite
- ✅ **Client refactoring**: Fully backward compatible, no API changes to
external consumers

## Related PRs

- Plugins PR:
redhat-openshift-ecosystem/provider-certification-plugins#82
  - Adds entrypoint script support to consume extracted test list
- Adds `getSuiteName()` function to support `DEFAULT_SUITE_NAME`
environment variable
  - Adds failure propagation to stop dependent plugins on failure

## Testing

### Validation Performed
- ✅ All tests pass: `make test` (30 packages)
- ✅ Go vet clean: `make vet`
- ✅ YAML linting passes: `yamllint`
- ✅ Build successful: `make build`
- ✅ Client unit tests: 24 test scenarios covering all code paths
- ✅ Dry-run mode tested with verbose output

### CI Validation
- Validated with CI rehearsal job on OCP 4.21 cluster
- All linter checks passing (go_lint, yaml)

## Migration Notes

For developers updating code that uses the client package:

**Before**:
```go
kclient, sclient, err := client.CreateClients()
if err != nil {
    return err
}
```

**After**:
```go
cli, err := client.NewClient()
if err != nil {
    return err
}
// Use cli.KClient, cli.SClient, cli.RestConfig
```

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants